13 research outputs found

    Social Media Analysis for Social Good

    Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal lives to the community and the world, in real time. Compared with conventional data sources, social media data is cost-effective to obtain, up to date and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promote social impact in an equitable manner. In this thesis, I present my research exploring innovative applications of natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data, ultimately to make a positive impact on society. First, I evaluate the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities using social media data. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using an intermediate task fine-tuning and multi-feature fusion approach. I also propose a multi-task learning framework to model the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.

    Multi-task Learning for Personal Health Mention Detection on Social Media

    Detecting personal health mentions on social media is essential to complement existing health surveillance systems. However, annotating data for detecting health mentions at a large scale is a challenging task. This research employs a multitask learning framework that leverages available annotated data from a related task to improve performance on the main task of detecting personal health experiences mentioned in social media texts. Specifically, we focus on incorporating emotional information into our target task by using emotion detection as an auxiliary task. Our approach significantly improves a wide range of personal health mention detection tasks compared to a strong state-of-the-art baseline. Comment: 5 pages
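As a rough illustration of the multitask setup described in this abstract, a joint objective of this kind is often written as a weighted sum of the main-task loss and the auxiliary-task loss. The weight $\lambda$ and the loss symbols below are assumptions made for illustration; the abstract does not state how the two losses are actually combined.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{PHM}} + \lambda \, \mathcal{L}_{\text{emotion}}, \qquad \lambda \ge 0
```

Here $\mathcal{L}_{\text{PHM}}$ stands for the personal health mention loss (main task) and $\mathcal{L}_{\text{emotion}}$ for the emotion detection loss (auxiliary task); setting $\lambda = 0$ would recover single-task training.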

    Incorporating Emotions into Health Mention Classification Task on Social Media

    The health mention classification (HMC) task is the process of identifying and classifying mentions of health-related concepts in text. This can be useful for identifying and tracking the spread of diseases through social media posts. However, this is a non-trivial task. Here we build on recent studies suggesting that using emotional information may improve upon this task. Our study results in a framework for health mention classification that incorporates affective features. We present two methods, an intermediate task fine-tuning approach (implicit) and a multi-feature fusion approach (explicit) to incorporate emotions into our target task of HMC. We evaluated our approach on 5 HMC-related datasets from different social media platforms including three from Twitter, one from Reddit and another from a combination of social media sources. Extensive experiments demonstrate that our approach results in statistically significant performance gains on HMC tasks. By using the multi-feature fusion approach, we achieve at least a 3% improvement in F1 score over BERT baselines across all datasets. We also show that considering only negative emotions does not significantly affect performance on the HMC task. Additionally, our results indicate that HMC models infused with emotional knowledge are an effective alternative, especially when other HMC datasets are unavailable for domain-specific fine-tuning. The source code for our models is freely available at https://github.com/tahirlanre/Emotion_PHM

    Language as a latent sequence: Deep latent variable models for semi-supervised paraphrase generation

    This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model. Combining VSAR with DDL (DDL+VSAR) enables us to conduct semi-supervised learning. Still, the combined model suffers from a cold-start problem. To further combat this issue, we propose an improved weight initialisation solution, leading to a novel two-stage training scheme we call knowledge-reinforced-learning (KRL). Our empirical evaluations suggest that the combined model yields competitive performance against the state-of-the-art supervised baselines on complete data. Furthermore, in scenarios where only a fraction of the labelled pairs are available, our combined model consistently outperforms the strong supervised model baseline (DDL) by a significant margin (Wilcoxon test). Our code is publicly available at https://github.com/jialin-yu/latent-sequence-paraphrase

    INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations

    XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, current approaches focus only on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper addresses this gap by proposing a generative XAI framework, INTERACTION (explaIn aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct intensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.

    Detecting Fine-Grained Emotions on Social Media during Major Disease Outbreaks: Health and Well-being before and during the COVID-19 Pandemic

    The COVID-19 pandemic has affected the whole world in various ways. One impact is that communication, work and interaction, a great part of our lives, have moved online onto various platforms, among the most popular of which are social media. Another, arguably less visible, impact is the emotional one. Detecting and understanding emotions is important to better discern the emotional health and well-being of the global population. Thus, in this work, we use a social media platform (Twitter) to analyse emotions in detail. Our contribution is twofold: (1) we propose EmoBERT, a new emotion-based variant of the BERT transformer model, able to learn emotion representations and outperform the state-of-the-art; (2) we provide a fine-grained analysis of the pandemic's effect in a major location, London, comparing specific emotions (annoyed, anxious, empathetic, sad) before and during the epidemic.

    Digital Inclusion in Northern England: Training Women from Underrepresented Communities in Tech: A Data Analytics Case Study

    The TechUPWomen programme takes 100 women from the Midlands and North of England, particularly from underrepresented communities, with degrees or experience in any subject area, retrains them in technology and, upon graduation, guarantees an interview with a company. The retraining programme, developed by the Partner Universities in conjunction with the Industrial Partners, has modules at level 6/7 including: Technology: coding, data science, cyber security, machine learning, agile project management; Workplace readiness skills: public speaking, clear communication, working as a team. In this paper, we introduce, for the first time, the TechUPWomen programme, and we analyse its temporal evolution and special features via a data analytics nowcasting approach. The programme deepens these women’s experience with applied upskilling through one-to-one mentoring (100-100), strong networking, residentials, close industry connection with two directions (non-technical & technical) and four job-focussed final tracks: business analyst, agile project manager, data scientist, developer. TechUPWomen also has significant representation of traditionally underrepresented communities, with a focus on an enabling rather than a teaching approach. Besides the originality of the programme's unique combination of features, this is, to the best of our knowledge, the first analysis based on data analytics of a women in tech(nology) retraining programme, based on nowcasting. Results show that the approach is effective; topic analysis shows that frequent topics include joy, BAME, networking, residential, industry, learning.

    EXPERT SYSTEM IN RURAL MEDICAL CARE

    This paper looks into how an expert system can be used to solve rural problems relating to medical care. An expert system is a computer program that simulates the thought process of a human expert to solve complex decision problems in a specific domain. Rural medical care is the health care system found in rural areas, whose operations are at their poorest owing to a lack of sufficient medical practitioners. This research work looks into the areas of rural medical care that could be aided by an expert system that would automate some of the processes and, at the same time, supplement the few medical officials available in rural areas in order to improve the healthcare system. This research considers the manual processes involved in the registration of patients in rural medical centres, diagnosis and the scheduling of follow-up appointments. The current system was carefully analysed to determine where modifications, changes and improvements should be made in the design of the proposed system. Finally, a computer system was designed, to be used as a prototype for further improvement.

    Temporal Sentiment Analysis of Learners: Public Versus Private Social Media Communication Channels in a Women-in-Tech Conversion Course

    Social media is ubiquitous, a continuous part of our daily lives; it offers new ways of communication. This is especially crucial in education, where various online systems make use of (perceived) public or private communication as a means to support the learning process, often in real time. However, not much research has been carried out in analysing and comparing such channels and the way participants use them. Thus, this paper analyses a course offering both public and private types of communication to its participants. Participants communicate via two social media channels (beyond traditional email etc.): Twitter (open to the public) and Microsoft Teams (for internal communication). In this paper, we specifically analyse the communication patterns of learners, focusing on the temporal analysis of their sentiments on the public versus the private platform. The comparison shows that, as might be expected, there are similarities between the sentiment expressed in public and private channels. Interestingly, however, the private platform is more likely to be used for negative utterances. It also shows that sentiment can be clearly connected to events in the course (e.g., the residentials increase both volume and positivity of comments). Finally, we propose new measures for sentiment analysis to better express the nature of change and speed of change of the sentiment in the two channels used by our learners during their learning process.
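The abstract above does not define its proposed measures, so the following is only a minimal sketch of one plausible reading: "speed of change" taken as the difference in mean sentiment between consecutive periods, and "nature of change" as the sign of that difference. The function names, the `tolerance` parameter and the sample scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def period_means(scores_by_period):
    """Average the sentiment scores observed in each period, keeping period order."""
    return [mean(scores) for scores in scores_by_period]


def speed_of_change(means):
    """First difference: change in mean sentiment between consecutive periods."""
    return [later - earlier for earlier, later in zip(means, means[1:])]


def nature_of_change(deltas, tolerance=0.0):
    """Label each change as rising, falling or stable (hypothetical labels)."""
    labels = []
    for delta in deltas:
        if delta > tolerance:
            labels.append("rising")
        elif delta < -tolerance:
            labels.append("falling")
        else:
            labels.append("stable")
    return labels


# Made-up weekly sentiment scores for one channel, in the range [-1, 1].
weekly_scores = [[0.0, 0.5], [0.5, 0.5], [0.25, 0.25]]

means = period_means(weekly_scores)
deltas = speed_of_change(means)
labels = nature_of_change(deltas)
```

Running the same three functions on a second channel's scores would allow the public and private channels to be compared period by period.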